
add multimodal support for qwen2.5 #90

Status: Open (1 commit, base: main)

abdulazizab2

Issue

The only model with multi-modal input support in the vLLM backend is Llama 3.2.

Contribution

  • Add support for Qwen2.5 multi-modal input
  • Refactor the code to make it easy to add other multi-modal input models

@MrYang1916 MrYang1916 left a comment

The code has been tested and works properly; it handles normal inference requests on Qwen2.5 multi-modal data.

@MohammedAlkhrashi

Thanks, that worked for me

@soulseen

@abdulazizab2 does this PR support Qwen/Qwen2.5-VL-7B-Instruct?

@abdulazizab2
Author

It should support all Qwen2.5-VL architectures. It worked for the 7B model specifically.

@soulseen

Does Triton support passing an image URL in the request for inference? I always get the following error:

{"error":"Error generating stream: Invalid base64-encoded string: number of data characters (5) cannot be 1 more than a multiple of 4"}

My request data is like:

headers = {
    "Content-Type": "application/json"
}

prompt= "what is a usdot number"
img_url = "http:/xxxx.com/xx.jpg"

data = {
    "text_input": "你好",  # "Hello" in Chinese
    # "text_input": "Describe the content of this image.",
    "image": img_url,
    "parameters": {
        "stream": False,
        "max_tokens": 256,
        "temperature": 0.7
    }
}
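
The error above is consistent with the backend trying to base64-decode the `image` field, in which case a URL string would be rejected. The following is a minimal sketch of that failure mode, not the backend's actual code: `looks_like_base64` is a hypothetical helper introduced only for illustration, and the image bytes are a stand-in.

```python
import base64
import binascii


def looks_like_base64(value: str) -> bool:
    """Return True if `value` decodes cleanly as standard base64."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        base64.b64decode(value, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


# Placeholder URL copied from the request above.
img_url = "http:/xxxx.com/xx.jpg"

# Stand-in bytes; a real request would read the image file instead.
encoded = base64.b64encode(b"\x89PNG fake image bytes").decode("ascii")

print(looks_like_base64(img_url))   # False
print(looks_like_base64(encoded))   # True
```

If this assumption holds, the image has to be downloaded and base64-encoded on the client before it is placed in the `image` field.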

@abdulazizab2
Author

@soulseen

Try this sample request and adapt it to your parameters:

#!/bin/bash

# Define image URL and local path
image_url="https://upload.wikimedia.org/wikipedia/en/thumb/7/7d/Lenna_%28test_image%29.png/440px-Lenna_%28test_image%29.png"
image_path="lenna.png"

# Download the image
curl -s -o "$image_path" "$image_url"

# Base64 encode the image without newlines
image_base64=$(base64 -w 0 "$image_path")

# Create the JSON payload
payload_file=$(mktemp)
cat > "$payload_file" <<EOF
{
    "text_input": "Describe these images ?",
    "image": "$image_base64",
    "sampling_parameters": {
        "max_tokens": 256,
        "temperature": 0
    },
    "exclude_input_in_output": true
}
EOF

# Send the POST request
url="http://localhost:8000/v2/models/qwen2.5_vl_3b/generate"
response=$(curl -s -X POST "$url" -H "Content-Type: application/json" -d @"$payload_file")

# Clean up
rm "$payload_file"

# Output the response
echo "$response"
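
For Python clients, the same request might be built as follows. This is a sketch that mirrors the field names in the shell script above (`text_input`, `image`, `sampling_parameters`, `exclude_input_in_output`); `build_payload` and `send` are hypothetical helper names, and only the standard library is used.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes: bytes, prompt: str, max_tokens: int = 256) -> dict:
    """Build the JSON body, base64-encoding the raw image bytes."""
    return {
        "text_input": prompt,
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "sampling_parameters": {"max_tokens": max_tokens, "temperature": 0},
        "exclude_input_in_output": True,
    }


def send(payload: dict, url: str) -> str:
    """POST the payload as JSON and return the raw response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Usage, assuming the server and model name from the script above:
#   with open("lenna.png", "rb") as f:
#       payload = build_payload(f.read(), "Describe these images ?")
#   print(send(payload, "http://localhost:8000/v2/models/qwen2.5_vl_3b/generate"))
```

Keeping payload construction separate from the network call makes it possible to inspect the encoded body without a running server.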

@soulseen

@abdulazizab2 thank you for sharing, but I now get an error like this:

E0717 09:56:34.965292 80494 model.py:507] "[vllm] Error generating stream: type object 'c_python_backend_utils.Logger' has no attribute 'log_warning'"

triton version: tritonserver:25.06-py3

@abdulazizab2
Author

Can you check with the following versions:
triton version: tritonserver:25.01-py3
vllm version: 0.8.5
